Discriminative motif analysis of high-throughput dataset
Authors
Abstract
MOTIVATION High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). Traditional motif discovery tools commonly predict motifs that are statistically significant against a naïve background distribution but of questionable biological relevance. RESULTS We describe a simple algorithm for discovering differential motifs between two sequence datasets that effectively eliminates systematic biases and scales to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. On the remaining, more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single-base-pair differences in the DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examining high-throughput DNase accessibility data. AVAILABILITY The motifRG package is publicly available via the Bioconductor repository. CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
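To make the idea of a "differential motif between two sequence datasets" concrete, here is a minimal Python sketch. It is not the motifRG algorithm, which the abstract does not specify. It assumes a much simpler stand-in: count how many foreground and background sequences contain each k-mer and rank k-mers by a pseudocount-smoothed log-odds ratio. The function names, the pseudocount, and the toy sequences are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter
from math import log2


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        # A set ensures each sequence contributes at most once per k-mer.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def discriminative_kmers(fg, bg, k=6, pseudo=0.5):
    """Rank k-mers by log2 odds of occurring in foreground vs. background.

    Illustrative scoring only (an assumption, not the published method):
    the smoothed fraction of foreground sequences containing the k-mer is
    divided by the smoothed fraction of background sequences containing it.
    """
    fg_counts = kmer_presence(fg, k)
    bg_counts = kmer_presence(bg, k)
    scores = {}
    for kmer in set(fg_counts) | set(bg_counts):
        p_fg = (fg_counts[kmer] + pseudo) / (len(fg) + 2 * pseudo)
        p_bg = (bg_counts[kmer] + pseudo) / (len(bg) + 2 * pseudo)
        scores[kmer] = log2(p_fg / p_bg)
    # Highest score first: most enriched in the foreground set.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: every foreground sequence carries the planted site TGACTCA.
    foreground = ["AATGACTCAGG", "CCTGACTCATT", "GGTGACTCAAA"]
    background = ["AACCGGTTAACC", "GGTTAACCGGTT", "TTAACCGGAACC"]
    for kmer, score in discriminative_kmers(foreground, background, k=7)[:3]:
        print(kmer, round(score, 3))
```

Because the score compares against an explicit background set rather than a naïve base-composition model, k-mers that are common in both sets score near zero, which is the intuition behind the discriminative approach described in the abstract.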
Similar resources
DECOD: fast and accurate discriminative DNA motif finding
MOTIVATION Motif discovery is now routinely used in high-throughput studies including large-scale sequencing and proteomics. These datasets present new challenges. The first is speed. Many motif discovery methods do not scale well to large datasets. Another issue is identifying discriminative rather than generative motifs. Such discriminative motifs are important for identifying co-factors and ...
A Feature-Based Approach to Modeling Protein–DNA Interactions
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic...
Finding HCV NS5A Discriminative Motifs for Assessment of IFN/Ribavirin Therapy Effect
The objective of this paper is twofold. On one hand, it aims to develop two algorithms of discriminative motif discovery from a small labeled dataset and a large unlabeled dataset. One uses exhaustive search for motifs of type ‘discriminative one occurrence per sequence’ and the other uses separate-and-conquer search for motifs of type ‘discriminative multiple occurrences per sequence’. One the...
Genome-wide inference of protein interaction sites: lessons from the yeast high-quality negative protein–protein interaction dataset
High-throughput studies of protein interactions may have produced, experimentally and computationally, the most comprehensive protein-protein interaction datasets in the completely sequenced genomes. It provides us an opportunity on a proteome scale, to discover the underlying protein interaction patterns. Here, we propose an approach to discovering motif pairs at interaction sites (often 3-8 r...
A general approach for discriminative de novo motif discovery from high-throughput data
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP...
Journal title:
- Bioinformatics
Volume 30, Issue 6
Pages -
Publication date 2014